-
Notifications
You must be signed in to change notification settings - Fork 0
Automated Test: authz-service-improve-caching-pr #341
Automated Test: authz-service-improve-caching-pr #341
Conversation
* remove the use of client side cache for in-proc authz client Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com> * add a permission denial cache, fetch perms if not in either of the caches Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com> * Clean up tests Co-authored-by: Ieva <ieva.vasiljeva@grafana.com> * Cache tests Co-authored-by: Ieva <ieva.vasiljeva@grafana.com> * Add test to list + cache Co-authored-by: Ieva <ieva.vasiljeva@grafana.com> * Add outdated cache test Co-authored-by: Ieva <ieva.vasiljeva@grafana.com> * Re-organize metrics Co-authored-by: Ieva <ieva.vasiljeva@grafana.com> --------- Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>
📝 WalkthroughWalkthroughThe changes introduce a denial caching mechanism to the RBAC authorization service. A new NoopCache type is added for in-process paths, the Service gains a permDenialCache field, and permission checks now leverage cached identity permissions with explicit denial entries to reduce database calls. Changes
Sequence DiagramsequenceDiagram
participant Client
participant Service
participant DenialCache as Denial Cache
participant IdentCache as Identity Cache
participant Database
Client->>Service: Check(permission)
Service->>DenialCache: Get denial entry
alt Denial cached
DenialCache-->>Service: Denied (cached)
Service-->>Client: Allowed: false
else No denial cached
Service->>IdentCache: Get cached permissions
alt Cached permissions hit
IdentCache-->>Service: Permissions (cached)
Service->>Service: Evaluate permission
alt Permission allowed
Service-->>Client: Allowed: true
else Permission denied
Service->>DenialCache: Set denial entry
Service-->>Client: Allowed: false
end
else Cache miss
Service->>Database: Fetch identity permissions
Database-->>Service: Permissions
Service->>IdentCache: Store permissions
Service->>Service: Evaluate permission
alt Permission allowed
Service-->>Client: Allowed: true
else Permission denied
Service->>DenialCache: Set denial entry
Service-->>Client: Allowed: false
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pkg/services/authz/rbac/service.go`:
- Around line 116-137: The permission cache metric is being recorded as a miss
even when getCachedIdentityPermissions(...) returns a cache hit but
checkPermission(...) yields allowed=false; update the logic around
permDenialKey, getCachedIdentityPermissions, and checkPermission so that
metrics.permissionCacheUsage reflects a cache hit in the err==nil branch (e.g.,
call s.metrics.permissionCacheUsage.WithLabelValues("true",
checkReq.Action).Inc() when cachedPerms were found regardless of allowed result,
or add a distinct label like "hit_denied") and only record "false" for real
cache misses outside that branch.
🧹 Nitpick comments (5)
pkg/services/authz/rbac/cache.go (1)
30-32: Cache key collision risk with_separator.The denial cache key shares the same prefix format as
userPermCacheKey(namespace + ".perm_" + userUID + "_" + action), withnameandparentappended using the same_delimiter. If any field value ever contains_, keys could collide—both across the two key types and within denial keys themselves (e.g.,name="a_b", parent="c"vsname="a", parent="b_c").Current resource names/UIDs likely don't contain
_, so the practical risk is low, but consider using a more robust separator (e.g.,/or a multi-char delimiter) to make the key scheme collision-proof.💡 Possible fix
func userPermDenialCacheKey(namespace, userUID, action, name, parent string) string { - return namespace + ".perm_" + userUID + "_" + action + "_" + name + "_" + parent + return namespace + ".perm_denial_" + userUID + "/" + action + "/" + name + "/" + parent }Using a distinct prefix (
perm_denial_vsperm_) and a different separator (/) eliminates any ambiguity withuserPermCacheKey.pkg/services/authz/rbac.go (1)
239-251: Receiver namelcis misleading forNoopCache.The receiver is named
lc(likely copied from a LocalCache pattern), but this is aNoopCache. Consider renaming toncorcfor consistency.💡 Proposed fix
-func (lc *NoopCache) Get(ctx context.Context, key string) ([]byte, error) { +func (nc *NoopCache) Get(ctx context.Context, key string) ([]byte, error) { return nil, cache.ErrNotFound } -func (lc *NoopCache) Set(ctx context.Context, key string, data []byte, exp time.Duration) error { +func (nc *NoopCache) Set(ctx context.Context, key string, data []byte, exp time.Duration) error { return nil } -func (lc *NoopCache) Delete(ctx context.Context, key string) error { +func (nc *NoopCache) Delete(ctx context.Context, key string) error { return nil }pkg/services/authz/rbac/service.go (2)
342-368:getCachedIdentityPermissionsperforms a DB call for user identifiers, partially negating the "cache-only" intent.For
TypeUser/TypeServiceAccount,GetUserIdentifiers(line 356) may hit the database if the identity isn't inidCache. This means the "fast cached path" can still trigger a DB round-trip just to construct the perm-cache key, even if the perm cache itself turns out to be empty. TheidCachehas a longer TTL (2 min) so this is mostly mitigated after the first request, but on a cold cache or after TTL expiry, everyCheck/Listcall pays this cost upfront.If this is intentional, a brief code comment explaining the tradeoff would help future maintainers.
153-155: Denial cache entries may outlive the permission cache, creating a brief stale-denial window.Both
permCacheandpermDenialCacheuseshortCacheTTL(30s), but the denial entry is written after the perm entry (which is set insidegetUserPermissions). When a permission is granted in the DB shortly after a denial is cached, the denial entry can outlive the perm entry by the delta between the two writes, causing the fast-path at line 117 to return a stale denial even afterpermCacherefreshes.In practice this delta is small (milliseconds), and the 30s TTL makes any staleness transient. However, it's worth documenting this known tradeoff (or setting the denial TTL slightly shorter than the perm TTL) so operators understand the behavior during permission changes.
pkg/services/authz/rbac/service_test.go (1)
973-995: Test doesn't conclusively prove the denial cache short-circuits.The
permCacheentry has"dashboards:uid:dash1": false, which would also result in a denial through the normal permission-check path. If the denial cache were removed, the test would still pass (via the perm check + DB fallback path). To truly verify that the denial cache takes priority, set a permCache value that would allow the request:- s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false}) + s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true})This way, if the denial cache is bypassed,
checkPermissionwould returnallowed=true, and the test would fail—proving the denial cache is the decisive factor.
| permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder) | ||
| if _, ok := s.permDenialCache.Get(ctx, permDenialKey); ok { | ||
| s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc() | ||
| s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() | ||
| return &authzv1.CheckResponse{Allowed: false}, nil | ||
| } | ||
|
|
||
| cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action) | ||
| if err == nil { | ||
| allowed, err := s.checkPermission(ctx, cachedPerms, checkReq) | ||
| if err != nil { | ||
| ctxLogger.Error("could not check permission", "error", err) | ||
| s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() | ||
| return deny, err | ||
| } | ||
| if allowed { | ||
| s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc() | ||
| s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc() | ||
| return &authzv1.CheckResponse{Allowed: allowed}, nil | ||
| } | ||
| } | ||
| s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misleading cache-usage metric when cache hits but check denies.
When getCachedIdentityPermissions returns cached permissions (cache hit) but checkPermission returns allowed=false, execution falls through to line 137 which unconditionally records a cache miss ("false"). The cache was found and consulted; the data simply didn't produce an allow. This inflates the cache-miss rate and deflates the cache-hit rate in monitoring, making it harder to assess actual cache effectiveness.
Consider recording the metric inside the if err == nil branch to reflect that the cache was hit, or use a distinct label (e.g., "hit_denied") to differentiate this path from a true cache miss.
💡 Proposed fix
cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action)
if err == nil {
allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
if err != nil {
ctxLogger.Error("could not check permission", "error", err)
s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return deny, err
}
if allowed {
s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc()
s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return &authzv1.CheckResponse{Allowed: allowed}, nil
}
+ // Cache was present but didn't grant access — still a cache hit (just not conclusive for allow)
+ s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()
+ } else {
+ s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()
}
- s.metrics.permissionCacheUsage.WithLabelValues("false", checkReq.Action).Inc()🤖 Prompt for AI Agents
In `@pkg/services/authz/rbac/service.go` around lines 116 - 137, The permission
cache metric is being recorded as a miss even when
getCachedIdentityPermissions(...) returns a cache hit but checkPermission(...)
yields allowed=false; update the logic around permDenialKey,
getCachedIdentityPermissions, and checkPermission so that
metrics.permissionCacheUsage reflects a cache hit in the err==nil branch (e.g.,
call s.metrics.permissionCacheUsage.WithLabelValues("true",
checkReq.Action).Inc() when cachedPerms were found regardless of allowed result,
or add a distinct label like "hit_denied") and only record "false" for real
cache misses outside that branch.
This pull request was automatically created by
@coderabbitai/e2e-reviewer.Batch created pull request.
Summary by CodeRabbit
Refactor
Tests